
Trends in Hearing

SAGE Publications

Preprints posted in the last 30 days, ranked by how well they match Trends in Hearing's content profile, based on 12 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Discrimination of spectrally sparse complex-tone triads in cochlear implant listeners

Augsten, M.-L.; Lindenbeck, M. J.; Laback, B.

2026-03-24 neuroscience 10.64898/2026.03.20.712905 medRxiv
Top 0.1%
6.8%

Cochlear implant (CI) users typically experience difficulties perceiving musical harmony due to a restricted spectro-temporal resolution at the electrode-nerve interface, resulting in limited pitch perception. We investigated how stimulus parameters affect discrimination of complex-tone triads (three-voice chords), aiming to identify conditions that maximize perceptual sensitivity. Six post-lingually deafened CI listeners completed a same/different task with harmonic complex tones, while spectral complexity, voice(s) containing a pitch change, and temporal synchrony (simultaneous vs. sequential triad presentation) were manipulated. CI listeners discriminated harmonically relevant one-semitone pitch changes within triads when spectral complexity was reduced to three or five components per voice, with significantly better performance for three-component compared to nine-component tones. Sensitivity was observed for pitch changes in the high voice or in both high and low voices, but not for changes in only the low voice. Single-voice sensitivity predicted simultaneous-triad sensitivity when controlling for spectral complexity and voice with pitch change. Contrary to expectations, sequential triad presentation did not improve discrimination. An analysis of processor pulse patterns suggests that difference-frequency cues encoded in the temporal envelope rather than place-of-excitation cues underlie perceptual triad sensitivity. These findings support reducing spectral complexity to enhance chord discrimination for CI users based on temporal cues.

2
Can Multimodal Large Language Models Visually Interpret Auditory Brainstem Responses?

Jedrzejczak, W.; Kochanek, K.; Skarzynski, H.

2026-04-17 otolaryngology 10.64898/2026.04.15.26350944 medRxiv
Top 0.1%
6.4%

Introduction: Auditory brainstem response (ABR) is a standard objective method for estimating hearing threshold, especially in patients who cannot reliably participate in behavioral audiometry. However, ABR interpretation is usually performed by an expert. This study evaluated whether two general-purpose artificial intelligence (AI) multimodal large language model (LLM) chatbots, ChatGPT and Qwen, can accurately estimate ABR hearing thresholds from ABR waveform images. The accuracy was measured by comparisons with the judgements of 3 expert audiologists. Methods: A total of 500 images each containing several ABR waveforms recorded at different stimulus intensities were analyzed. Three expert audiologists established the reference auditory thresholds based on visual identification of wave V at the lowest stimulus intensity, with the most frequent judgment among the three used as the reference. Each waveform image was independently submitted to ChatGPT (version 5.1) and Qwen (version 3Max) using the same standardized prompt and without additional clinical context. Agreement with the expert thresholds was assessed as mean errors and correlations. Sensitivity and specificity for detecting hearing loss (>20 dB nHL) were also calculated. In cases where the AI and expert thresholds nominally matched, corresponding latency measures were also compared. Results: Auditory thresholds derived from both LLMs correlated strongly with expert opinion, with Pearson r = 0.954 for ChatGPT and r = 0.958 for Qwen. ChatGPT showed a mean error of +5.5 dB and Qwen showed a mean error of -2.7 dB. Exact nominal agreement with expert values was achieved in 34.6% of ChatGPT estimates and 35.6% of Qwen estimates; agreement within +/-10 dB was observed in 75.6% and 80.0% of cases, respectively. For hearing-loss classification, ChatGPT achieved 100% sensitivity but low specificity (20.4%), whereas Qwen showed a more balanced profile with 91.6% sensitivity and 67.5% specificity. Curiously, estimates of wave V latency were markedly poor for both LLMs, with systematic underestimation and weak correlations with the expert judgements. Conclusion: ChatGPT and Qwen demonstrated a moderate ability to estimate ABR thresholds from waveform images, although their performance was not good enough for independent clinical use. Both models captured general patterns of hearing loss severity, but there was systematic bias, limited specificity and sensitivity balance, and poor latency estimation. General-purpose multimodal LLMs may have potential as assistive or preliminary tools, but clinically reliable ABR interpretation will likely require specialized, domain-trained AI systems with expert oversight.
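The agreement statistics reported here (mean signed error, Pearson correlation, exact and within-10 dB agreement, and sensitivity/specificity at the >20 dB nHL criterion) are straightforward to reproduce from paired threshold estimates. Below is a minimal Python sketch, not the authors' code; the function and variable names are hypothetical.

```python
# Illustrative sketch (not the authors' code): agreement metrics between
# model-derived and expert ABR thresholds. Variable names are hypothetical.
import numpy as np
from scipy.stats import pearsonr

def agreement_metrics(expert_db, model_db, loss_cutoff_db=20.0):
    """Compare model-derived ABR thresholds (dB nHL) against expert reference values."""
    expert_db = np.asarray(expert_db, dtype=float)
    model_db = np.asarray(model_db, dtype=float)

    errors = model_db - expert_db                 # signed error per record
    r, _ = pearsonr(expert_db, model_db)          # Pearson correlation

    # Hearing-loss classification at the >20 dB nHL criterion used in the study
    expert_loss = expert_db > loss_cutoff_db
    model_loss = model_db > loss_cutoff_db
    tp = np.sum(model_loss & expert_loss)
    tn = np.sum(~model_loss & ~expert_loss)
    fp = np.sum(model_loss & ~expert_loss)
    fn = np.sum(~model_loss & expert_loss)

    return {
        "mean_error_db": errors.mean(),
        "pearson_r": r,
        "exact_agreement": np.mean(errors == 0),
        "within_10_db": np.mean(np.abs(errors) <= 10),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```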

3
Modeling the Influence of Bandwidth and Envelope on Categorical Loudness Scaling

Neely, S. T.; Harris, S. E.; Hajicek, J. J.; Petersen, E. A.; Shen, Y.

2026-04-01 neuroscience 10.64898/2026.03.30.715393 medRxiv
Top 0.1%
4.8%

In a loudness-matching paradigm, a reduction in the loudness of sounds with bandwidths less than one-half octave compared to a tone of equal sound pressure level has been observed previously for five-tone complexes at 60 dB SPL centered at 1 kHz. Here, this loudness-reduction phenomenon is explored using band-limited noise across wide ranges of frequency and level. Additionally, these measurements are simulated by a model of loudness judgement based on neural ensemble averaging (NEA), which serves as a proxy for central auditory signal processing. Multi-frequency equal-loudness contours (ELC) were measured for each of the adult participants (N=100) with pure-tone average (PTA) thresholds that ranged from normal to moderate hearing loss using a categorical-loudness-scaling (CLS) paradigm. Presentation level and center frequency of the test stimuli were determined on each trial according to a Bayesian adaptive algorithm, which enabled multi-frequency ELC estimation within about five minutes of testing. Three separate test conditions differed by stimulus type: (1) pure-tone, (2) quarter-octave noise and (3) octave noise. For comparison, loudness judgements for all three stimulus types were also simulated by the NEA model, which comprised a nonlinear, active, time-domain cochlear model with an appended stage of neural spike generation. Mid-bandwidth loudness reduction was observed to be greatest at moderate stimulus levels and frequencies near 1 kHz. This feature was approximated by the NEA model, which suggests involvement of an early stage of the central auditory system in the formation of loudness judgements.

4
Improving Automated Diagnosis of Middle and Inner Ear Pathologies by Estimating Middle Ear Input Impedance from Wideband Tympanometry

Kamau, A. F.; Merchant, G. R.; Nakajima, H. H.; Neely, S. T.

2026-03-31 otolaryngology 10.64898/2026.03.26.26349034 medRxiv
Top 0.1%
3.6%

Conductive hearing loss (CHL) with a normal otoscopic exam can be difficult to diagnose because routine clinical measures such as audiometric air-bone gaps (ABGs) can identify a conductive component but often cannot distinguish among specific underlying mechanical pathologies (e.g., stapes fixation versus superior canal dehiscence, which may produce similar audiograms). Wideband tympanometry (WBT) is a fast, noninvasive test that can provide additional mechanical information across a broad range of frequencies (200 Hz to 8 kHz). However, WBT metrics are influenced by variations in ear canal geometry and probe placement and can be challenging to interpret clinically. In this study, we extend prior WBT absorbance-based classification work by estimating the middle ear input impedance at the tympanic membrane (ZME), a WBT-derived metric intended to reduce ear canal effects. To estimate ZME, we fit an analog circuit model of the ear canal, middle ear, and inner ear to raw WBT data collected at tympanometric peak pressure (TPP). Data from 27 normal ears, 32 ears with superior canal dehiscence, and 38 ears with stapes fixation were analyzed. A multinomial logistic regression classifier was trained using principal component analysis (retaining 90% variance) and stratified 5-fold cross-validation with regularization. We compared feature sets based on ABGs alone, ABGs combined with absorbance, and ABGs combined with the magnitude of ZME. The combination of ABGs and the magnitude of ZME produced the best performance, achieving an overall accuracy of 85.6% compared to 80.4% for ABGs alone and 78.4% for ABGs combined with absorbance. These results suggest that incorporating model-derived middle ear impedance features with standard audiometric measures (ABGs) can improve automated pathology classification for stapes fixation and superior canal dehiscence.
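The classification pipeline described (PCA retaining 90% of variance feeding a regularized multinomial logistic regression, evaluated with stratified 5-fold cross-validation) maps directly onto standard tooling. The sketch below is an illustration using scikit-learn on synthetic placeholder data, not the authors' implementation; the feature layout and hyperparameters are assumptions.

```python
# Illustrative sketch of the classification pipeline described above:
# PCA (90% variance) + regularized multinomial logistic regression,
# stratified 5-fold cross-validation. Data are random placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# X: e.g. ABGs concatenated with |ZME| across frequency; y: 0=normal, 1=SCD, 2=stapes fixation
rng = np.random.default_rng(0)
X = rng.normal(size=(97, 40))       # hypothetical feature matrix (97 ears x 40 features)
y = rng.integers(0, 3, size=97)     # hypothetical class labels

clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90),                       # keep components explaining 90% of variance
    LogisticRegression(max_iter=5000, C=1.0),     # L2-regularized multinomial regression
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(f"Mean cross-validated accuracy: {scores.mean():.3f}")
```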

5
Speech-in-Noise Difficulties in Aminoglycoside Ototoxicity Reflects Combined Afferent and Efferent Dysfunction

Motlagh Zadeh, L.; Izhiman, D.; Blankenship, C. M.; Moore, D. R.; Martin, D. K.; Garinis, A.; Feeney, P.; Hunter, L. R.

2026-03-26 otolaryngology 10.64898/2026.03.23.26348719 medRxiv
Top 0.1%
2.7%

Objectives: Patients with cystic fibrosis (CF) often receive aminoglycosides (AGs) to manage recurrent pulmonary infections, placing them at risk for ototoxicity. Chronic AG use can lead to complex cochlear damage affecting inner and outer hair cells, the stria vascularis, and spiral ganglion neurons. The greatest damage is typically in the basal cochlear region, which encodes high-frequency hearing, with additional involvement of more apical regions. While extended-high-frequency (EHF) hearing loss (EHFHL; 9-16 kHz) is often the earliest sign of AG ototoxicity, speech-in-noise (SiN) effects are rarely studied. Our overall hypothesis is that SiN perception difficulties in individuals with CF treated with AGs are related to combined cochlear and neural damage, primarily in the EHF range but also in the standard frequency (SF; 0.25-8 kHz) range. Three mechanisms that contribute to SiN perception were evaluated in children and young adults: 1) a primary effect of reduced EHF sensitivity, measured by pure-tone audiometry (PTA) and transient-evoked otoacoustic emissions (TEOAEs); 2) a secondary effect of subclinical damage in the SF range, measured by PTA and TEOAEs; and 3) additional neural effects, measured by middle ear muscle reflex (MEMR) thresholds (afferent) and growth functions (efferent). Design: A total of 185 participants were enrolled: 101 individuals with CF treated with intravenous AGs and 84 age- and sex-matched controls without hearing concerns or CF. Assessments included EHF and SF PTA; the Bamford-Kowal-Bench (BKB)-SIN test for SiN perception; double-evoked TEOAEs with chirp stimuli from 0.71 to 14.7 kHz; and ipsilateral and contralateral wideband MEMR thresholds and growth functions using broadband stimuli. Results: Reduced sensitivity at EHFs (PTA, TEOAEs) was not associated with impaired SiN perception in the CF group. SF hearing, regardless of EHF status, was the primary predictor of SiN performance in the CF group. Increased MEMR growth was also significantly associated with poorer SiN in the CF group. Conclusions: In CF, impaired SiN perception was primarily predicted by SF hearing impairment, with additional involvement of the efferent auditory pathway through increased MEMR growth. These results build on prior evidence for efferent neural effects of ototoxic exposures, supporting both sensory (afferent) and neural (efferent) mechanisms that contribute to listening difficulties in CF. Thus, preventive and intervention strategies should consider these combined mechanisms in people with AG ototoxicity to address their SiN problems.

6
Acoustic Salience Drives Pupillary Dynamics in an Interrupted, Reverberant Task

Figarola, V.; Liang, W.; Luthra, S.; Parker, E.; Winn, M.; Brown, C.; Shinn-Cunningham, B. G.

2026-04-02 neuroscience 10.64898/2026.03.31.715639 medRxiv
Top 0.1%
1.9%

Listeners face many challenges when trying to maintain attention to a target source in everyday settings; for instance, reverberation distorts acoustic cues and interruptions capture attention. However, little is known about how these challenges affect the ability to maintain selective attention. Here, we measured syllable recall accuracy and pupil dilation during a spatial selective attention task that was sometimes disrupted. Participants heard two competing, temporally interleaved syllable streams presented in pseudo-anechoic or reverberant environments. On randomly selected trials, a sudden interruption occurred mid-sequence. Compared to anechoic trials, reverberant performance was worse overall, and the interrupter disrupted performance. In uninterrupted trials, reverberation reduced peak pupil dilation both when it was consistent across all stimuli in a block and when it was randomized trial to trial, suggesting temporal smearing reduced clarity of the scene and the salience of events in the ongoing streams. Pupil dilations in response to interruptions indicated perceptual salience was strong across reverberant and anechoic conditions. Specifically, baseline pupil size before trials did not vary across room conditions, and mixing or blocking of trials (altering stimulus expectations) had no impact on pupillary responses. Together, these findings highlight that stimulus salience drives cognitive load more strongly than does task performance.

7
A Blinded Comparative Evaluation of Clinical and AI-Generated Responses to Otologic Patient Queries

Akinniyi, S.; Jain-Poster, K.; Evangelista, E.; Yoshikawa, N.; Rivero, A.

2026-04-15 otolaryngology 10.64898/2026.04.14.26350677 medRxiv
Top 0.1%
1.0%

Objective: The objective of this study is to assess the quality, empathy, and readability of large language model (LLM) responses to otologic questions from patients, compared with verified physician responses in patient-driven forums. This study aims to predict the potential utility of LLMs in patient-centered communication. Study Design: Comparative study. Setting: Internet. Methods: A sample of 49 otology-related questions posted on Reddit r/AskDocs between January 2020 and June 2025 was selected using search terms including "hearing loss," "ear infection," "tinnitus," "ear pain," and "vertigo." Posts were retrieved using Reddit's "Top" filter. Each question was answered by a verified doctor on Reddit and three AI LLMs (ChatGPT-4o, ClaudeAI, Google Gemini). Responses were scored by five evaluators. Results: Common otologic concerns posed in patient questions were otalgia (38.7%), vertigo (28.6%), tinnitus (24.5%), hearing loss (22.4%), and aural fullness (20.4%). LLM responses were longer than physician responses (mean 145 vs 67 words; p < .05) and rated higher in quality (10.95 vs 9.58), empathy (7.26 vs 5.18), and readability (4.00 vs 3.73) (all p < .05). Evaluators correctly identified AI versus physician responses in 89.4% of cases, with higher sensitivity for detecting physician responses (93.5%). By Flesch-Kincaid grade level, ChatGPT produced the most readable content (mean 7.25), while ClaudeAI responses were more complex (11.86; p < .05). Conclusion: LLM responses received higher ratings in quality, empathy, and readability than those of physicians across a variety of otologic concerns. When appropriately implemented, such systems may enhance access to understandable otologic information and complement clinician-delivered care.
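For readers unfamiliar with the Flesch-Kincaid grade level used above to compare readability, the metric is a fixed formula over average sentence length and syllables per word. The sketch below is a rough illustration; its syllable counter is a crude vowel-group heuristic rather than the dictionary-based counting used by production readability tools.

```python
# Rough sketch of the Flesch-Kincaid grade-level formula:
# 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59.
# The syllable counter is an approximate heuristic, so values are indicative only.
import re

def count_syllables(word: str) -> int:
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

print(flesch_kincaid_grade("Tinnitus is the perception of sound without an external source."))
```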

8
Investigating neural speech processing with functional near infrared spectroscopy: considerations for temporal response functions

Wilroth, J.; Sotero Silva, N.; Tafakkor, A.; de Avo Mesquita, B.; Ip, E. Y. J.; Lau, B. K.; Hannah, J.; Di Liberto, G. M.

2026-03-23 neuroscience 10.64898/2026.03.20.713212 medRxiv
Top 0.1%
0.8%

Functional near infrared spectroscopy (fNIRS) is increasingly used in hearing and communication research, with advantages such as robustness to movement artifacts, improved spatial resolution, and flexibility in the contexts in which it can be applied. At the same time, the field is progressively moving towards more continuous, naturalistic listening paradigms, resulting in the widespread adoption of speech-tracking analyses such as temporal response functions (TRFs) in electroencephalography (EEG) and magnetoencephalography (MEG) studies. However, it remains unclear whether these analyses can be applied to the slower haemodynamic signals measured by fNIRS. In the present study, we investigated whether a TRF framework can similarly be applied to fNIRS data recorded during continuous speech perception. Eight participants listened to speech simultaneously while fNIRS signals were acquired in a hyperscanning setup. Speech features were regressed onto the haemodynamic responses to test the feasibility and interpretability of fNIRS-based TRFs. Prediction correlations between observed and modelled fNIRS signals across speech features were higher than those typically reported for EEG-TRF studies and comparable to those reported for MEG-TRF studies. Moreover, these correlations did not overlap with a null distribution generated from trial-mismatched fNIRS data, confirming statistical significance, and were slightly greater than those obtained from a conventional GLM approach. Our findings support that TRF estimation can yield meaningful and statistically significant responses from fNIRS data. Highlights: TRF modelling can be meaningfully applied to fNIRS data acquired during speech listening tasks. Prediction correlations between actual and modelled fNIRS signals were above chance level, with values comparable to previous EEG/MEG studies. TRFs explained more fNIRS variance than a conventional GLM approach.
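A TRF in this sense is a set of regression weights mapping time-lagged copies of a speech feature (such as the envelope) onto a recorded signal. The sketch below illustrates the idea as plain ridge regression on synthetic data at an fNIRS-like sampling rate; the lag range, regularization, and data are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch of TRF estimation as time-lagged ridge regression between a
# speech feature and one channel. Synthetic data; parameters are illustrative.
import numpy as np

def lagged_design(stimulus, lags):
    """Build a design matrix whose columns are the stimulus shifted by each lag."""
    n = len(stimulus)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stimulus[: n - lag]
        else:
            X[:lag, j] = stimulus[-lag:]
    return X

def fit_trf(stimulus, response, lags, lam=1.0):
    """Ridge-regression TRF: weights mapping the lagged stimulus to the response."""
    X = lagged_design(stimulus, lags)
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ response)

# Example with synthetic data at 10 Hz (haemodynamic signals are slow)
rng = np.random.default_rng(1)
envelope = rng.normal(size=600)                    # 60 s of a speech feature
true_trf = np.exp(-np.arange(0, 80) / 20.0)        # slow, HRF-like kernel
signal = np.convolve(envelope, true_trf)[:600] + rng.normal(scale=0.5, size=600)

lags = np.arange(0, 80)                            # 0-8 s of lags
w = fit_trf(envelope, signal, lags, lam=10.0)
pred = lagged_design(envelope, lags) @ w
print("Prediction correlation:", np.corrcoef(pred, signal)[0, 1].round(3))
```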

9
Hearing sounds when the eyes move: A case study implicating the tensor tympani in eye movement-related peripheral auditory activity

King, C. D.; Zhu, T.; Groh, J. M.

2026-03-25 neuroscience 10.64898/2026.03.24.713974 medRxiv
Top 0.1%
0.7%

Information about eye movements is necessary for linking auditory and visual information across space. Recent work has suggested that such signals are incorporated into processing at the level of the ear itself (Gruters, Murphy et al. 2018). Here we report confirmation that the eye movement signals that reach the ear can produce perceptual consequences, via a case report of an unusual participant with tensor tympani myoclonus who hears sounds when she moves her eyes. The sounds she hears could be recorded with a microphone in the ear in which she hears them (the left), and occurred for large leftward eye movements to extreme orbital positions of the eyes. The sounds elicited by this participant's eye movements were reminiscent of eye movement-related eardrum oscillations (EMREOs; Gruters, Murphy et al. 2018, Brohl and Kayser 2023, King, Lovich et al. 2023, Lovich, King et al. 2023, Lovich, King et al. 2023, Abbasi, King et al. 2025, Sotero Silva, Kayser et al. 2025, King and Groh 2026, Leon, Ramos et al. 2026, Sotero Silva, Brohl et al. 2026), but were larger and longer lasting than classical EMREOs, helping to explain why they were audible to her. Overall, the observations from this patient help establish that (a) eye movement-related signals specifically reach the tensor tympani muscle and that (b) when there is an abnormality involving that muscle, such signals can lead to actual audible percepts. Given that the tensor tympani contributes to the regulation of sound transmission in the middle ear, these findings support the conclusion that eye movement signals reaching the ear have functional consequences for auditory perception. The findings also expand the types of medical conditions that produce gaze-evoked tinnitus, to date most commonly observed in connection with acoustic neuromas.

10
BioDCASE: Using data challenges to make community advances in computational bioacoustics

Stowell, D.; Nolasco, I.; McEwen, B.; Vidana Vila, E.; Jean-Labadye, L.; Benhamadi, Y.; Lostanlen, V.; Dubus, G.; Hoffman, B.; Linhart, P.; Morandi, I.; Cazau, D.; White, E.; White, P.; Miller, B.; Nguyen Hong Duc, P.; Schall, E.; Parcerisas, C.; Gros-Martial, A.; Moummad, I.

2026-04-06 animal behavior and cognition 10.64898/2026.04.02.716062 medRxiv
Top 0.2%
0.5%

Computational bioacoustics has seen significant advances in recent decades. However, the rate of insights from automated analysis of bioacoustic audio lags behind our rate of collecting the data - due to key capacity constraints in data annotation and bioacoustic algorithm development. Gaps in analysis methodology persist: not because they are intractable, but because of resource limitations in the bioacoustics community. To bridge these gaps, we advocate the open science method of data challenges, structured as public contests. We conducted a bioacoustics data challenge named BioDCASE, within the format of an existing event (DCASE). In this work we report on the procedures needed to select and then conduct useful bioacoustics data challenges. We consider aspects of task design such as dataset curation, annotation, and evaluation metrics. We report the three tasks included in BioDCASE 2025 and the resulting progress made. Based on this we make recommendations for open community initiatives in computational bioacoustics.

11
Individualised evoked response detection based on the spectral noise colour

Undurraga Lucero, J. A.; Chesnaye, M.; Simpson, D.; Laugesen, S.

2026-04-13 health informatics 10.64898/2026.04.11.26350685 medRxiv
Top 0.2%
0.4%

Objective detection of evoked potentials (EPs) is central to digital diagnostics in hearing assessment and clinical neurophysiology, yet current approaches remain time-intensive and sensitive to inter-individual noise variability. Many existing detection methods rely on population-based assumptions or computationally demanding procedures, limiting robustness and efficiency in real-world clinical settings. We present Fmpi, a digital EP detection framework enabling individualised, real-time response detection through analytical modelling of the spectral colour and temporal dynamics of background noise within each recording. Using extensive simulations and large-scale human electroencephalography datasets spanning brainstem, steady-state, and cortical EPs recorded in adults and infants, we demonstrate performance comparable or superior to state-of-the-art bootstrapped methods while operating at a fraction of the computational cost and maintaining well-controlled sensitivity with improved specificity. Importantly, Fmpi incorporates a futility detection mechanism enabling early termination of uninformative recordings, reducing testing time without compromising diagnostic reliability.

12
Transformer Language Models Reveal Distinct Patterns in Aphasia Subtypes and Recovery Trajectories

Ahamdi, S. S.; Fridriksson, J.; Den Ouden, D.

2026-03-27 neuroscience 10.64898/2026.03.27.714240 medRxiv
Top 0.2%
0.3%

Language impairments in aphasia are characterized by various representational disruptions that may be reflected in discourse production. This research examines the capacity of transformer-based language models, particularly GPT-2, to serve as a computational framework for analyzing variations in aphasic narrative speech. A longitudinal dataset of narrative speech samples collected at six time points from individuals with aphasia (N = 47) was utilized as part of an intervention study. All transcripts were processed via the GPT-2 language model to obtain activation values from each of the 12 transformer layers. Statistically significant differences in activation magnitude across aphasia subtypes were found at every layer (all p < .001), with the most pronounced effects in the deeper layers. Pairwise Tukey HSD tests revealed consistent distinctions between Broca's aphasia and both Anomic and Wernicke's aphasia, suggesting a shared activation profile between the latter two. Longitudinal tests revealed significant changes over time, especially in the final three layers (10-12). These findings suggest that transformer-based activation patterns reflect meaningful variation in aphasic discourse and could complement current diagnostic tools. Overall, GPT-2 provides a scalable tool to model representational dynamics in aphasia and enhance the clinical interpretability of deep language models.
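Per-layer activations of the kind analyzed here can be extracted from GPT-2 with the Hugging Face transformers library. The sketch below (not the authors' code) computes a mean absolute hidden-state magnitude per transformer layer for a single transcript; the choice of summary statistic and the example sentence are assumptions.

```python
# Illustrative sketch: per-layer GPT-2 activation magnitudes for one transcript,
# using Hugging Face `transformers`. Not the authors' implementation.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def layer_activation_magnitudes(transcript: str):
    """Mean absolute hidden-state activation for each of the 12 transformer layers."""
    inputs = tokenizer(transcript, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states[0] is the embedding layer; [1:] are the 12 transformer layers
    return [h.abs().mean().item() for h in outputs.hidden_states[1:]]

print(layer_activation_magnitudes("the boy is reaching for the cookie jar"))
```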

13
Linguistic and Acoustic Biomarkers from Simulated Speech Reveal Early Cognitive Impairment Patterns in Alzheimer's Disease

Debnath, A.; Sarkar, S.

2026-04-08 neuroscience 10.64898/2026.04.08.717162 medRxiv
Top 0.2%
0.3%

Background: Alzheimer's disease (AD) causes progressive decline in language and cognition. Automated speech analysis has emerged as a promising screening tool, yet clinical data scarcity limits progress. To address this, we generated a large-scale simulated speech dataset to model linguistic and acoustic deterioration across cognitive stages: Control, Mild Cognitive Impairment (MCI), and AD. Methods: Using Monte Carlo simulations, we emulated the Pitt DementiaBank "Cookie Theft" narratives. Acoustic features (speech rate, pause duration, jitter, shimmer) and linguistic features (type-token ratio, unique-word count, filler usage) were synthetically sampled from real-world DementiaBank distributions. We trained an XGBoost classifier to distinguish diagnostic groups, and applied SHAP (Shapley Additive exPlanations) to assess feature importance. Results: The model achieved high discriminative performance (AUC ≈ 0.94; accuracy ≈ 85%). Compared to controls, simulated MCI and AD groups showed progressive declines in fluency and lexical diversity, and increases in disfluencies and voice instability. SHAP analysis revealed that key predictors included reduced type-token ratio, higher pause and filler rates, and elevated jitter/shimmer. Classification was most accurate for Control vs. AD; MCI misclassifications highlighted intermediate profiles. Interpretation: Our framework, FMN (Forget Me Not), captures clinically relevant speech changes using simulated data, offering an explainable and scalable approach for cognitive screening. While not a substitute for real datasets, FMN validates a pipeline that mirrors known AD markers and can guide future real-world deployments. External validation remains a key next step for translational impact.
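The modeling step (a gradient-boosted classifier explained with SHAP values) can be sketched as follows. The example simplifies to a binary Control vs. AD contrast on synthetic placeholder features whose names mirror those listed in the abstract; the distributions, labels, and hyperparameters are assumptions, not the DementiaBank-derived data.

```python
# Sketch of the XGBoost + SHAP workflow described above, on synthetic placeholder
# data with a binary Control-vs-AD label. Not the authors' implementation.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap

rng = np.random.default_rng(42)
features = ["speech_rate", "pause_duration", "jitter", "shimmer",
            "type_token_ratio", "unique_words", "filler_rate"]
n = 300
X = pd.DataFrame(rng.normal(size=(n, len(features))), columns=features)
y = rng.integers(0, 2, size=n)   # 0 = Control, 1 = AD (placeholder labels)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
if isinstance(shap_values, list):        # some SHAP versions return one array per class
    shap_values = shap_values[1]

# Mean absolute SHAP value per feature as a global importance ranking
importance = np.abs(shap_values).mean(axis=0)
for name, score in sorted(zip(features, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.4f}")
```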

14
Human brains implicitly and rapidly distinguish AI from human voices before decoding prosodic meaning

Chen, W.; Pell, M.; Jiang, X.

2026-04-09 neuroscience 10.64898/2026.04.08.716483 medRxiv
Top 0.3%
0.1%

People encounter AI voices daily. Existing behavioral studies suggest listeners rely on prosodic cues such as intonation and expressiveness to detect audio deepfakes, reporting that AI voices sound prosodically less rich than human voices. To test whether prosodic processing drives deepfake discrimination in the neural time course of voice processing, we recorded electroencephalographic (EEG) data while participants listened to human and AI-generated speakers producing utterances in confident vs. doubtful prosody (tone of voice), with attention directed toward memorizing speaker names. We used voice cloning to control for speaker identity confounds between human and AI voices. Multivariate pattern analysis revealed that neural discrimination of human vs. AI voices emerged rapidly regardless of prosody (confident: 176 ms; doubtful: 134 ms), substantially preceding prosody discrimination (confident vs. doubtful within human voices: 2066 ms; within AI voices: 1366 ms). Acoustic analysis confirmed that prosodic distinctions became classifiable only at utterance offset (90% normalized duration), converging with neural evidence that prosody requires near-complete temporal integration. This temporal dissociation between rapid voice source discrimination and late-emerging prosody decoding suggests that prosody plays a smaller role in audio deepfake detection than listeners retrospectively report. Representational similarity analysis further revealed that spectral envelope features (mel-frequency cepstral coefficients; MFCC), rather than the visually salient high-frequency energy differences, drove neural human-AI discrimination, with MFCCs' earliest independent contribution (228 ms) closely following the MVPA decoding onset (134-176 ms). Future studies may manipulate specific acoustic components to establish the causal sources of this rapid and sustained neural discrimination. Significance Statement: People encounter AI voices daily, in phone calls, navigation apps, supermarket checkouts, and subway announcements. Using electroencephalography, we show that the human brain automatically and rapidly distinguishes everyday AI voices from human speech, even without conscious attention to voice source. Although people may attribute this ability to AI voices sounding monotone or prosodically unnatural, the brain relies on subtler acoustic signatures, enabling discrimination before prosodic information becomes available. Attempts to identify the specific acoustic features driving this neural detection were inconclusive, pointing to the need for future causal investigations. We encourage engineers and policymakers to ensure AI voices remain perceptually detectable, as increasingly humanlike AI voices could cognitively disadvantage the general public if they become indistinguishable from human speech.

15
Sparse Stimulus Generation Improves Reverse Correlation Efficiency and Interpretability

Gargano, J. A.; Rice, A.; Chari, D. A.; Parrell, B.; Lammert, A. C.

2026-03-26 neuroscience 10.64898/2026.03.24.714012 medRxiv
Top 0.3%
0.1%

Reverse correlation is a widely used and well-established method for probing latent perceptual representations in which subjects render subjective preference responses to ambiguous stimuli. Stimuli are purposefully designed to have no direct relationship with the target representation (e.g., they are randomly generated), a property which makes each individual stimulus minimally informative toward reconstructing the target, and often difficult for subjects to interpret. As a result, a large number of stimulus-response pairs must be gathered from a given subject in order for reconstructions to be of sufficient quality, making the task fatiguing. Recent work has demonstrated that the number of trials needed can be substantially reduced using a compressive sensing framework that incorporates into the reconstruction process the assumption that the target representation can be sparsely represented in some basis. Here, we introduce an alternative method that incorporates the sparsity assumption directly into stimulus generation, which holds promise not only for improving efficiency, but also for improving the interpretability of stimuli from the subjects' perspective. We develop this new method as a mathematical variation of the compressive sensing approach, before conducting one simulation study and two human subjects experiments to assess the benefits of this method for reconstruction quality, sample-size efficiency, and subjective interpretability. Results show that sparse stimulus generation improves all three of these areas relative to conventional reverse correlation approaches, and also relative to compressive sensing in most conditions.
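As background for the method being improved, conventional reverse correlation estimates the latent template as a response-weighted average of the random stimuli. The sketch below simulates that baseline with a hypothetical template and observer; it does not implement the sparse stimulus generation or compressive sensing approaches the preprint studies.

```python
# Minimal sketch of conventional reverse correlation: the latent template is
# estimated as the mean of stimuli judged "yes" minus the mean judged "no".
# The template and simulated observer are placeholders, not from the preprint.
import numpy as np

rng = np.random.default_rng(7)
dim, n_trials = 64, 2000
template = np.sin(np.linspace(0, np.pi, dim))      # hypothetical internal representation

stimuli = rng.normal(size=(n_trials, dim))         # random (non-sparse) stimuli
# Simulated observer: responds "yes" when the noisy projection onto the template is positive
responses = (stimuli @ template + rng.normal(scale=2.0, size=n_trials)) > 0

# Classification-image estimate
estimate = stimuli[responses].mean(axis=0) - stimuli[~responses].mean(axis=0)
estimate /= np.linalg.norm(estimate)
print("Correlation with true template:",
      np.corrcoef(estimate, template / np.linalg.norm(template))[0, 1].round(3))
```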

16
Domain Specific Functional Plasticity of Visual Processing Constrained by General Cognitive Ability in Deaf Individuals

Dong, C.; Wang, Z.; Zuo, X.; Wang, S.

2026-03-26 neuroscience 10.64898/2026.03.25.714101 medRxiv
Top 0.3%
0.1%

Interpersonal communication relies on integrating facial and vocal signals to extract multidimensional communicative information. How the absence of audition reshapes the communicative system remains unclear. We compared the performance of deaf (N=136) and hearing (N=135) adults across multiple domains (facial identity, emotional expression, speech, and global motion) through a series of unisensory and audiovisual psychophysical tasks. The results showed that, in hearing individuals, reliance on facial versus vocal signals differed across domains. In deaf individuals, auditory deprivation did not produce uniform enhancement or impairment of visual processing. Instead, they exhibited reduced sensitivity to dynamic emotional expressions and global motion, preserved sensitivity to facial identity (both static and dynamic) and static expressions, and enhanced categorization of facial speech. Notably, sensitivity to dynamic facial expressions and global motion was correlated, and both were explained by variations in fluid intelligence. Our results provide a systematic characterization of visual function across domains in deaf individuals, suggesting that the consequences of hearing loss are shaped both by the functional roles of audition within each domain and by broader cognitive adaptations. These findings advance understanding of cross-modal plasticity and inform the development of targeted, ecologically valid accessibility and sensory-substitution strategies.

17
Rites of Passage: Professional Identity Formation and the OTOHNS Oral Board Exam

McMains, K.

2026-03-19 otolaryngology 10.64898/2026.03.19.26347858 medRxiv
Top 0.4%
0.0%

Objectives: Professional Identity Formation has been defined as an individual internalizing the values and norms of the medical profession in ways that result in thinking, acting, and feeling like a physician. During the COVID-19 pandemic, the ABOHNS pivoted the format of the oral board exam from in-person exams to virtually administered exams. In light of this, we ask: (1) How, if at all, do Otolaryngology-Head and Neck Surgery Oral Board Examinations shape examinee professional identity? (2) Do different formats of administering Otolaryngology-Head and Neck Surgery Oral Board Examinations have different effects on examinee Professional Identity Formation (PIF)? Methods: Thematic analysis was used to explore candidate experience. We developed and tested a shortened Professional Identity Essay that foregrounds the PIF effects resulting from differing methods of administering the Oral Board Examination. Themes generated from semi-structured interviews were compared to identify differences in Professional Identity resulting from OBEs. Results: Nineteen participants enrolled in our study, each completing a single interview lasting between 15 and 30 minutes. We found participants' responses to coalesce around three themes: the educational effect of the OBE on PIF; distinct stresses carried by different OBE formats; and the catalytic effect on PIF of the in-person OBE. Conclusions: Participating in either format of the ABOHNS OBE demonstrated an educational effect on PIF. Additionally, when delivered in an in-person format, the ABOHNS OBE also catalyzed ongoing PIF. This effect of the OBE offers an additional potent mechanism to integrate the most inclusive range of candidates into the community of Otolaryngology practice. Level of Evidence: VI (single qualitative study investigating perspectives of healthcare providers on a specific intervention).

18
Deficits in tail-lift and air-righting reflexes in rats after ototoxicity associate with loss of vestibular type I hair cells

Palou, A.; Tagliabue, M.; Beraneck, M.; Llorens, J.

2026-03-26 neuroscience 10.64898/2026.03.24.712950 medRxiv
Top 0.4%
0.0%

The rat vestibular system plays a critical role in anti-gravity responses such as the tail-lift reflex and the air-righting reflex. In a previous study in male rats, we obtained evidence that these two reflexes depend on the function of non-identical populations of vestibular sensory hair cells (HC). Here, we caused graded lesions in the vestibular system of female rats by exposing the animals to several different doses of an ototoxic chemical, 3,3'-iminodipropionitrile (IDPN). After exposure, we assessed the anti-gravity responses of the rats and then assessed the loss of type I HC (HCI) and type II HC (HCII) in the central and peripheral regions of the crista, utricle, and saccule. As expected, we recorded a dose-dependent loss of vestibular function and loss of HCs. The relationship between hair cell loss and functional loss was examined using non-linear models fitted by orthogonal distance regression. The results indicated that both the tail-lift and air-righting reflexes mostly depend on HCI function. However, the two reflexes differed in the epithelium on which they depend: while the tail-lift response is sensitive to loss of crista and/or utricle HCIs, the air-righting response depends rather on utricular and/or saccular integrity.

19
Vestibular Function Loss Associates With Sensory Epithelium Pathology In Vestibular Schwannoma Patients

Borrajo, M.; Callejo, A.; Castellanos, E.; Amilibia, E.; Llorens, J.

2026-03-25 neuroscience 10.64898/2026.03.23.713132 medRxiv
Top 0.4%
0.0%

Vestibular schwannomas (VS) cause vestibular function loss by mechanisms that are still poorly understood. We evaluated the vestibulo-ocular reflex by the video-assisted Head Impulse Test (vHIT) in patients with planned tumour resection by a trans-labyrinthine approach. The vestibular sensory epithelia were collected and processed by immunofluorescent labelling for confocal microscopy analysis of sensory hair cell subtypes (type I, HCI, and type II, HCII), calyx endings of the pure-calyx afferents, and the calyceal junction normally found between HCI and the calyx (n=23). Comparing Normofunction and Hypofunction patients, we concluded that worse vestibular function associates with decreased HCI and HCII counts in the sensory epithelia and with an increased proportion of damaged calyces. A decrease in the numbers of HCI and of calyx endings of the pure-calyx afferents was found to be associated with increasing age. Partial least squares regression (PLSR) models indicated that VS and age had independent, additive effects on vestibular function. Correlation analyses indicated that lower vHIT gains associate with lower numbers of HCI and increased percentages of damaged calyces. These data support the hypothesis that the deleterious effect of VS on vestibular function is mediated, at least in part, by its damaging impact on the vestibular sensory epithelium. They also provide further evidence for the dependency of the vestibulo-ocular reflex on HCI function and for calyceal junction pathology as a common response of the sensory epithelium to HC stress.

20
Negative emotional visual stimuli alter specific improvised dance biomechanics in professional dancers

Maracia, B. C. B.; Souza, T. R.; Oliveira, G. S.; Nunes, J. B. P.; dos Santos, C. E. S.; Peixoto, C. B.; Lopes-Silva, J. B.; Nobrega, L. A. O. d. A.; Araujo, P. A. d.; Souza, R. P.; Souza, B. R.

2026-03-20 neuroscience 10.64898/2026.03.18.711707 medRxiv
Top 0.4%
0.0%

Dance is a core form of human-environment interaction and a powerful medium for emotional expression, yet dancers are routinely exposed to environmental affective cues that may shape their movement. We tested whether a negative emotional context induced immediately before improvisation alters dance biomechanics. Twenty professional dancers performed two 3-min improvised dances. Between dances, they viewed either Neutral or Negatively valenced pictures from the International Affective Picture System (IAPS; 2 min 40 s, 5 s per image). Eye tracking verified attention to the visual stream. Mood was assessed at four time points (PT1-PT4) using the Brazilian Mood Scale (BRAMS), and full-body, three-dimensional kinematics were captured at 300 Hz using a 9-camera optoelectronic system (Qualisys) and processed to measure global movement amplitude and expansion. Negative IAPS exposure increased tension, depression, and fatigue, and decreased vigor from PT2 to PT3. Biomechanically, the Negative Stimulus dancers showed a significant reduction in global movement amplitude after negative IAPS exposure, with reduced movement amplitude of the body extremities. In contrast, global movement expansion remained unchanged; that is, the extremities were not positioned closer to or farther from the pelvis. Neutral images produced no mood change and no measurable modulation of movement amplitude or expansion. Together, these results support the hypothesis that improvised dance carries biomechanical signatures of the dancer's current affective state, beyond the intended expressive content, and provide an automated motion-capture workflow for studying emotion-movement coupling in spontaneous dance. Highlights: Negative visual context shifted dancers' mood toward negative affect. Negative images reduced movement amplitude in improvised dance. Movement expansion remained stable despite mood induction.